Skip to content

TST: Consolidate tests that raise in groupby #50749

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

albuzenet
Copy link
Contributor

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Part of #50133. Continue the work of @rhshadrach started in #50404 for the groupby consolidation.

@mroeschke mroeschke requested a review from rhshadrach January 16, 2023 19:15
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This is a good step toward consolidating these. However there are many removed tests that are not part of the added tests (and conversely, there are many added tests that were not originally there! That bit is great). I've identified which I believe are missing below, please let me know if you think any are mistaken or would like explanation as to why.

I think if we can get coverage with the following cases, it's be good.

  • Multiple groupings (.groupby(['a', 'b']))
  • observed=False for categorical
  • Series selection after groupby (.groupby(...)['column'])
  • Groupings that aren't in-axis (.groupby([0, 0, 1]))
  • Groupings that are groupers (.groupby(pd.Grouper(...)))
  • NumPy functions (.groupby(...).agg(np.mean))

@mroeschke mroeschke added Testing pandas testing functions or related to the test suite Groupby labels Jan 17, 2023
@albuzenet
Copy link
Contributor Author

Thanks for the review @rhshadrach.

I updated the tests, I believe now it should cover all the cases you mentioned (and more). Wondering if 28k tests for a single file isn't to much though ?

Anyway, I think we are still missing some cases like grouping on the index or grouping on a dtype other than int or category. Not sure if you think it is necessary here?

@albuzenet albuzenet requested a review from rhshadrach February 1, 2023 12:35
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow - great work here! For any added tests, can you add # GH#50749 to the top of any new tests.

What's the runtime of this file on your machine?



@pytest.fixture(params=[True, False])
def groupby_serie(request):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: groupby_series


if groupby_serie:
if groupby_func == "corrwith":
pytest.skip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why skip here?

Copy link
Contributor Author

@albuzenet albuzenet Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the corrwith method is not implemented for SeriesGroupBy. We could test for an AttributeError, but to me it makes more sense to skip it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should assert not hasattr(gb, 'corrwith') and return. By skipping, if corrwith is ever added then this test will always not fail. On the other hand, if you add the assert, then if it is added this test will fail and the author will know to update it.

@albuzenet
Copy link
Contributor Author

albuzenet commented Feb 3, 2023

On my machine, I get :
==== 28501 passed, 420 skipped in 113.16s (0:01:53) ====

Also, do you know why the CI pipeline is failing ? I see some failed tests in pandas/tests, but I don't see the connection with this PR ?

@albuzenet albuzenet requested a review from rhshadrach February 6, 2023 14:53
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I am concerned though with the runtime for this. Can you remove sort and as_index from the tests - I think it's safe to test without these and would reduce the runtime by ~3/4ths.

Also, do you know why the CI pipeline is failing ? I see some failed tests in pandas/tests, but I don't see the connection with this PR ?

This appears unrelated to me too. This is okay to ignore.


if groupby_serie:
if groupby_func == "corrwith":
pytest.skip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should assert not hasattr(gb, 'corrwith') and return. By skipping, if corrwith is ever added then this test will always not fail. On the other hand, if you add the assert, then if it is added this test will fail and the author will know to update it.

@albuzenet
Copy link
Contributor Author

Done. The new runtime on my machine is :
===== 10561 passed in 44.23s =====

@albuzenet albuzenet requested a review from rhshadrach February 9, 2023 15:45
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rhshadrach rhshadrach added this to the 2.0 milestone Feb 9, 2023
@rhshadrach rhshadrach merged commit 8661548 into pandas-dev:main Feb 9, 2023
@rhshadrach
Copy link
Member

Thanks @albuzenet - great work!

@albuzenet
Copy link
Contributor Author

Thanks for the review @rhshadrach !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Groupby Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants